Fairness-aware recommendation with meta learning

Fairness has become a critical value online, and the latest studies consider it in many problems. In recommender systems, fairness is important since the visibility of items is controlled by systems. Previous fairness-aware recommender systems assume that sufficient relationship data between users and items are available. However, it is common that new users and items are frequently introduced, and they have no relationship data yet. In this paper, we study recommendation methods to enhance fairness in a cold-start state. Fairness is more significant when the preference of a user or the popularity of an item is unknown. We propose a meta-learning-based cold-start recommendation framework called FaRM to alleviate the unfairness of recommendations. The proposed framework consists of three steps. We first propose a fairness-aware meta-path generation method to eliminate bias in sensitive attributes. In addition, we construct fairness-aware user representations through the meta-path aggregation approach. Then, we propose a novel fairness objective function and introduce a joint learning method to minimize the trade-off between relevancy and fairness. In extensive experiments with various cold-start scenarios, it is shown that FaRM is significantly superior in fairness performance while preserving relevance accuracy over previous work.

start problem by recommending popular items or recommending items preferred by users with the same user characteristics (gender, age, occupation, etc.). However, it is essential to mitigate data bias in these attributes, such as gender bias, as shown in Fig. 1. This is why enhancing fairness in cold-start is more important than improving fairness in warm-start. Zhu et al. 20 captured the importance of unbiased recommendations for new items and proposed a learnable framework that eliminates popularity bias in the item cold-start scenario. However, the framework is limited in that it considers only the popularity bias of individual items and ignores bias in sensitive attributes such as gender. This limitation leads to recommending unwanted items to new users for whom only demographic information, not feedback data, is available. As shown in Fig. 3a, a recommendation algorithm that is unaware of fairness recommends Romance movies to new female users who prefer War and Sci-Fi, because it does not remove the gender bias. The problem is that it takes a considerable amount of time for the system to learn the taste of the new user, which may eventually lead to the churn of the new user. In contrast, the fairness-aware recommender system in Fig. 3b, which is free of gender bias, improves user satisfaction by recommending Sci-Fi and War films that the user likes as soon as they use the platform. Therefore, fairness for sensitive attributes in the cold-start state is vital for retaining new users and increasing the number of heavy users.
Figure 1. Gender distribution by genre of each user's preference for the Movielens 1M dataset. We define each user's preferred genre as a genre with a proportion of more than 10% of the movies with which users interacted.
This paper proposes a novel framework called FaRM (Fairness-aware Recommendation with Meta-learning), which reduces bias for sensitive attributes of users or items and can also adapt to cold-start states. Previous works related to meta-learning-based recommendations 9,10 have alleviated the cold-start problem, but the unfairness problem remains unresolved. Our study aims to enhance the fairness of the meta-learning-based recommendation framework to overcome the limitations of previous works.

Contributions
The key contributions of our work can be summarized as follows:
• This is the first attempt to improve the fairness of the cold-start recommendation model, which recommends items to new users reasonably.
• We propose a novel fairness-aware framework named FaRM, which enhances fairness in a meta-learning-based model. We introduce a novel meta-path generation method that improves fairness through the fairness-aware random walker. We also investigate joint training techniques for minimizing the trade-off between relevance and fairness.
• Extensive experiments demonstrate that FaRM enhances fairness in cold-start scenarios and significantly outperforms various state-of-the-art methods.
The remainder of this paper has the following structure. We discuss existing works on meta-learning and fairness related to FaRM in section "Related work" and formalize the problem of FaRM in section "Problem definition". In section "Methodology", we present the proposed recommendation framework, FaRM, and introduce new methods that bring fairness to meta-learning-based cold-start recommender systems. In section "Experiments", we experimentally evaluate the proposed model. Finally, we conclude our findings and discuss future research in section "Conclusion".

Cold-start recommendation
Sparse user-item interaction for new users and items (i.e., cold-start recommendation) is one of the challenging problems in collaborative filtering 2,5,21. Early research focused on content-based filtering 22, which uses metadata from users and items to solve the cold-start problem. Shi et al. 23 alleviated the cold-start problem by introducing heterogeneous information networks (HINs) 24 that embed multiple meta-paths to improve the quality of contents.
The success of meta-learning 25, which can learn from even a small amount of data, has contributed significantly to solving the cold-start problem. Vartak et al. 26 solved the item cold-start problem by introducing metric-based few-shot learning on recommendation tasks to adapt to new items. Lee et al. 9 proposed a recommendation framework that improves performance in various cold-start scenarios by applying an optimization-based approach, MAML 27. Moreover, Lu et al. 10 proposed a method to solve cold-start problems at both the data level and the model level by applying HIN 23,24 to the MAML framework. Although several studies have shown that reducing bias in cold-start is essential 19,28, these methods did not consider fairness or de-biasing. Therefore, we aim to improve the quality of recommendations by reducing bias for sensitive attributes and improving fairness in the MAML framework 9,10,27.

Fair meta-learning
Fairness has become an indispensable problem in machine learning in recent years 13,29. A small amount of research has recently begun to improve fairness in meta-learning approaches 30–32. Slack et al. 31 proposed a fairness-aware online meta-learning framework by adding fairness constraints based on decision boundary covariance (DBC) 33. Similarly, Zhao et al. 32 applied fairness-aware constraints to the few-shot image classification task. In addition, Slack et al. 30 proposed two kinds of fairness regularizers and improved the fairness of the MAML framework 27 by joint training 34 between the accuracy loss and the fairness regularizer. However, these approaches focused only on general classification tasks, not recommendation tasks. This paper proposes a novel fairness regularizer suitable for the rating prediction task to reduce bias between different item groups in MAML-based recommender systems 9,10.

Fairness-aware recommendation
Fairness has begun to be studied in recommender systems because unfair recommendations can cause fatal damage to users or platforms 11,16,18,20,35. Fairness in recommendation tasks can be categorized as user-side (i.e., consumer-side) and item-side (i.e., provider-side) fairness 14,36. On the item side, Abdollahpouri et al. 37 analyzed the impact of popularity bias on different individuals or groups of users. Furthermore, Biega et al. 38 formalized equity-of-attention fairness, which captures the difference between the deserved and received attention in post-processing. Meanwhile, Yao et al. 39 provided four fairness metrics for group-level fairness on the user side. Li et al. 40 provided a fairness-constrained re-ranking method to enhance the fairness of different user groups. Islam et al. 15 proposed a novel fair recommendation network by applying two de-biasing methods for user embeddings to neural collaborative filtering (NCF) 21. In addition, fairness works have also been proposed from various perspectives, such as multi-side fairness 41,42, adversarial learning 11,18, HIN representation learning 17, re-ranking 43–45, and in-processing methods 46,47.
Unfortunately, these methods address fairness in warm-start with existing users and items rather than in cold-start. Zhu et al. 20 captured this limitation of existing works and proposed a learnable re-ranking framework that strengthens fairness in cold-start. However, this framework aims to reduce only the item popularity bias while overlooking the bias for the user's sensitive attributes. To overcome these limitations of previous works, we aim to de-bias the sensitive attributes by improving the fairness of user-oriented meta-learning tasks 9,10.

Problem definition
In this section, we introduce the problem definition of FaRM. This paper is inspired by HIN-based recommendation models, and the definition of an HIN is as follows 10,23,24.
Definition 1 Heterogeneous information network. We suppose that our dataset is a heterogeneous information network $G = (V, E)$, where $V$ denotes the set of nodes and $E$ denotes the set of links. The network is associated with a node type mapping function $\varphi : V \to A$ and a link type mapping function $\phi : E \to R$, where $A$ denotes the set of node types and $R$ denotes the set of link types, with $|A| + |R| > 2$.
We propose a novel algorithm to generate a fair meta-path in section "Methodology", and a meta-path is defined as follows 10.
Definition 2 Meta-path. We define a meta-path $p$, which generates node sequences, as a path in the form of $p = a_1 \xrightarrow{r_1} a_2 \xrightarrow{r_2} \cdots \xrightarrow{r_l} a_{l+1}$, where $l$ denotes the length of $p$, each $a_i \in A$ and $r_i \in R$.
We define sensitive attributes for users or items, such as gender, as follows.
Definition 3 Sensitive attributes. A sensitive attribute mapping function is defined as $\Psi : (V, A) \to S$, where $S$ denotes the set of sensitive attributes.

Methodology
In this section, we introduce a novel fairness-aware recommendation framework, FaRM. Furthermore, we propose various fairness-aware methods of FaRM.

Overall framework of FaRM
Figure 4 shows the overall structure of FaRM, the MAML-based fairness-aware recommendation framework proposed in this paper. Given the set of users $U = \{u_1, u_2, \ldots, u_N\}$ and the set of items $I = \{i_1, i_2, \ldots, i_M\}$, the task for each user $u$ is defined as $T_u = (S_u, Q_u)$, where $S_u$ denotes the support set of user $u$ and $Q_u$ denotes the query set of user $u$. For each task, the procedure shown in Fig. 4 is performed. First, a fairness-aware random walker creates de-biased meta-paths for a target user $u$. Second, we convert each of the meta-paths into a dense representation. Then, the meta-path aggregator aggregates two types of meta-paths to create a dense user representation $x_u$ that enters the recommendation model $f$ as input. Finally, the proposed framework learns the model through joint training 34 for fairness and relevance objectives.
www.nature.com/scientificreports/

Fairness-aware random walker
Several existing works have employed a random walk 48 to construct meta-paths 49,50. However, a plain random walk cannot capture the bias of the sensitive attributes. We therefore propose a transition probability for a fair random walk that generates the next node fairly.
As shown in Fig. 5, the proposed algorithm generates two types of meta-paths. The meta-paths $P$ are of type UM (User − Movie) or UMUM (User − Movie − User − Movie), where UM encodes the context of "movies rated by the user", and UMUM encodes the context of "movies rated by another user who has seen the same movie". The transition probability of a random walker is defined as follows,
$$P(v_{i+1} \mid v_i) = 1 - P(\Psi(v_{i+1}, a_{i+1}) \mid \Psi(v_i, a_i)), \qquad (1)$$
where $v_{i+1}$ is a neighbor node of $v_i$, $a_i$ and $a_{i+1}$ are the node types of $v_i$ and $v_{i+1}$, respectively, and $a_i \neq a_{i+1}$. For convenience of explanation, we assume that $A$ in Definition 2 is $A = \{\text{User}(U), \text{Movie}(M)\}$, where $a_i, a_{i+1} \in A$. In addition, the sensitive attribute mapping function in Definition 3 returns a gender value (male or female) if the node type is User(U) and returns genre values (Romance, Action, Thriller, etc.) if the node type is Movie(M). For example, if $a_i$ is the node type User, $\Psi(v_i, a_i)$ is a gender, such as male or female. Similarly, if $a_i$ is the node type Movie, $\Psi(v_i, a_i)$ is a genre, such as Romance or Action. Equation (1) allows us to select more nodes for disadvantaged groups and fewer nodes for advantaged groups.
Example 1 Suppose we are considering a meta-path from user A, who is male, to a Romance movie α. Here, $\Psi(v_i, a_i)$ represents the sensitive attribute value of user A, which is 'male', and $\Psi(v_{i+1}, a_{i+1})$ represents the sensitive attribute value of movie α, which is 'romance'. The value of $P(\Psi(v_{i+1}, a_{i+1}) \mid \Psi(v_i, a_i))$ can be calculated using statistics from the dataset. For example, when this value is calculated from the Movielens 1M dataset 12, the probability is 0.1598, and the transition probability is therefore 1 − 0.1598 = 0.8402. The entire transition probability matrix calculated in this manner from the Movielens 1M dataset is given in Table 3. Unlike previous random walk methods that select the next node with equal probability, the fairness-aware random walk proposed in this paper selects the next node based on the transition probability and generates a de-biased meta-path accordingly.
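The arithmetic of Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fair_transition_weights` and the toy interaction list are hypothetical, and the conditional probability is estimated as a simple relative frequency, which the paper does not spell out. Only the value 0.1598 comes from the text above.

```python
from collections import Counter


def fair_transition_weights(pairs):
    """Return 1 - P(target attribute | source attribute) for each observed pair.

    `pairs` is a list of (source_attribute, target_attribute) tuples, one per
    observed user-movie interaction. Estimating the conditional probability
    by relative frequency is an assumption of this sketch.
    """
    pairs = list(pairs)
    pair_counts = Counter(pairs)
    source_counts = Counter(source for source, _ in pairs)
    return {
        (source, target): 1.0 - count / source_counts[source]
        for (source, target), count in pair_counts.items()
    }


# Hypothetical toy data: one male user interaction with Romance, three with Action.
toy_pairs = [("male", "Romance")] + [("male", "Action")] * 3
toy_weights = fair_transition_weights(toy_pairs)

# The figure reported in Example 1: P = 0.1598 gives a weight of 0.8402.
example_weight = round(1 - 0.1598, 4)
```

In the toy data, Romance is the rarer target for male users, so it receives the larger weight, which matches the stated intent of selecting more nodes for disadvantaged groups.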
We generate meta-paths $P$ of each node through the following Algorithm 1 using the pre-defined transition probability of Eq. (1).
Algorithm 1. Fairness-aware Random Walker
 1: …
 2: Initialize the set of meta-paths P
 3: for each random walk step do
 4:     p ← {v}    ▷ Initialize a meta-path
 5:     …
 6:     for i ← 0 to l − 1 do
 7:         Sample a node v_i from the transition probability
 8:         Add v_i to p
 9:         …
10:     end for
11:     …
12: end for
13: return P
14: end procedure
Algorithm 1 describes the proposed meta-path generation procedure in detail. We assume that the node type of the input node (i.e., the first node) $v$ is User. First, the set of meta-paths $P$ is initialized (line 2). Second, a meta-path $p$ is initialized in each step of the random walker (line 4). Next, the neighboring node $v_i$ of $v_{cur}$ is sampled from the transition probability defined in Eq. (1) and enters set $p$ (lines 7-8). Finally, the algorithm generates the fair meta-paths $P$. We generate the meta-paths UM and UMUM for each user through Algorithm 1, where $l$ of UM is 2 and $l$ of UMUM is 4.
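The walker described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `fair_random_walks`, the adjacency-dictionary graph, and the toy weights are all hypothetical. The sketch assumes that `length` counts the nodes in a path (so UM has length 2 and UMUM has length 4), that pairs without a listed weight default to 1.0, and that the next node is drawn with probability proportional to its Eq. (1) weight.

```python
import random


def fair_random_walks(graph, attr, weight, start, num_walks, length, seed=0):
    """Generate `num_walks` node sequences of `length` nodes starting at `start`.

    graph:  dict mapping each node to a list of neighbor nodes
    attr:   dict mapping each node to its sensitive attribute value
    weight: dict mapping (source_attr, target_attr) to a transition weight
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(num_walks):
        path = [start]
        current = start
        while len(path) < length:
            neighbors = graph.get(current, [])
            if not neighbors:
                break  # dead end: return the shorter path
            weights = [weight.get((attr[current], attr[n]), 1.0) for n in neighbors]
            if sum(weights) <= 0:
                weights = [1.0] * len(neighbors)  # fall back to a uniform draw
            current = rng.choices(neighbors, weights=weights, k=1)[0]
            path.append(current)
        paths.append(path)
    return paths


# Hypothetical bipartite toy graph: two users and two movies.
toy_graph = {"u1": ["m1", "m2"], "u2": ["m1"], "m1": ["u1", "u2"], "m2": ["u1"]}
toy_attr = {"u1": "male", "u2": "female", "m1": "Romance", "m2": "Action"}
# 0.8402 is the value from Example 1; the Action weight is made up.
toy_weight = {("male", "Romance"): 0.8402, ("male", "Action"): 0.4}

umum_paths = fair_random_walks(toy_graph, toy_attr, toy_weight, "u1", num_walks=5, length=4)
um_paths = fair_random_walks(toy_graph, toy_attr, toy_weight, "u1", num_walks=5, length=2)
```

Because the toy graph is bipartite, every generated path alternates between user and movie nodes, which gives the UM and UMUM shapes.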
We generate a dense latent vector for user $u$ as follows 10,
$$e_{u,t} = \sigma(\mathrm{MEAN}(\{W e_j + b : j \in P_{u,t}\})), \qquad (2)$$
where $P_{u,t}$ is the set of meta-paths with the meta-path type $t$ for user $u$, $W$ is initialized using Xavier 51, $\sigma$ is the activation function, and MEAN(·) is mean pooling. Afterward, we aggregate the de-biased dense user representations of each user, which is formulated as
$$x_u = \sum_{t \in T} a_t e_{u,t}, \qquad (3)$$
where $T$ denotes the set of meta-path types, and $a_t$ denotes the weight of the meta-path type $t$. We set $a_t$ to $1/|T|$ for all $t$ in our experiments in section "Experiments".

Co-regularizer
We present a novel fairness regularizer and design a joint training method 30 to minimize the trade-off between relevance and fairness performance. The proposed fairness regularizer $L_F$ is formulated as the relative standard variance of the average predicted scores of the item groups, where $\hat{y}_u$ denotes the set of predicted scores of items for user $u$, $g_k$ denotes the $k$-th group, and each average is taken over the predicted scores of the items that belong to group $g_k$ among the items rated by user $u$. The regularizer $L_F$ encourages the recommendation model to learn that each user rates items fairly regardless of the group of items. The loss function of relevance is the mean squared error 52, which is formulated as
$$L_R = \frac{1}{|I_u|} \sum_{i \in I_u} (y_{ui} - \hat{y}_{ui})^2, \qquad (5)$$
where $I_u$ is the set of items rated by user $u$, and $y_{ui}$ and $\hat{y}_{ui}$ denote the actual and the predicted score given by user $u$ to item $i$, respectively. We learn the item preference for each user by minimizing the relevance loss function $L_R$. The final loss function $L$ combines the relevance loss $L_R$ and the fairness regularizer $L_F$, where $\gamma$ is the fairness weight that controls the importance of fairness and $0 \le \gamma \le 1$.

Fairness-aware meta-learner
K-shot fairness 30, which learns new tasks from a small amount of data, aims to (1) learn both the fairness and the accuracy of recommendations quickly at the same time, and (2) enable tuning to achieve different balances between accuracy and fairness in order to minimize the trade-off between the two. The task-specific learner makes predictions with the recommendation model $f$, for which we employ a multi-layer perceptron 53 with two layers.
In detail, the local parameter $\theta_i$ is optimized through backpropagation of the final loss function on the support set. Similarly, the global parameter $\theta$ is optimized through backpropagation of the loss on the query set. The ultimate goal of FaRM is to quickly adapt the recommendation model $f$ to new users and items, minimizing the degradation of relevance performance while increasing fairness performance.
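The joint objective can be illustrated with a small, self-contained sketch. Two points here are assumptions and are not taken from the text: the "relative standard variance" is read as the population standard deviation of the per-group average predictions divided by their mean, and the final loss is taken to be the convex combination $(1 - \gamma) L_R + \gamma L_F$. All function names are hypothetical.

```python
from statistics import mean, pstdev


def relevance_loss(y_true, y_pred):
    """Mean squared error between actual and predicted ratings."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def fairness_regularizer(y_pred, groups):
    """Relative spread of the average predicted score per item group.

    Assumption: standard deviation of the group averages divided by their
    mean (a coefficient of variation). The exact form may differ.
    """
    scores_by_group = {}
    for score, group in zip(y_pred, groups):
        scores_by_group.setdefault(group, []).append(score)
    group_means = [mean(scores) for scores in scores_by_group.values()]
    center = mean(group_means)
    if center == 0:
        return 0.0  # avoid dividing by zero when all averages cancel out
    return pstdev(group_means) / abs(center)


def joint_loss(y_true, y_pred, groups, gamma=0.5):
    """Assumed combination: (1 - gamma) * L_R + gamma * L_F, with 0 <= gamma <= 1."""
    return (1 - gamma) * relevance_loss(y_true, y_pred) + gamma * fairness_regularizer(y_pred, groups)


# Hypothetical ratings of one user for four movies in two genre groups.
toy_true = [4.0, 4.0, 2.0, 2.0]
toy_pred = [4.0, 4.0, 2.0, 2.0]
toy_groups = ["Romance", "Romance", "Action", "Action"]
toy_loss = joint_loss(toy_true, toy_pred, toy_groups, gamma=0.5)
```

In the toy case the predictions match the ratings exactly, so the relevance term is zero and the whole loss comes from the gap between the two group averages. Under this reading, the regularizer pushes the model toward similar average scores across item groups, at some cost in rating accuracy.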

Experiments
In this section, we demonstrate that our model is superior by comparing it with other baseline models.

Dataset
We experiment using the Movielens 1M dataset 12, a benchmark dataset for recommendation models. Table 1 shows statistics for the Movielens dataset. The dataset contains 6040 users, 3881 movies, and 1,000,209 ratings ranging from 1 to 5. In Table 1, the underlined attributes represent sensitive attributes for users and items. User attributes contain gender, age, occupation, and zip code, and the user-sensitive attribute, gender, is a binary group. Item attributes include genre, publishing year, age group, director, and actor, and the genre is the item-sensitive attribute.
Table 2 shows gender-based statistics for the movie genre, an item-sensitive attribute. We chose seven genres with gender imbalances: Romance, Action, Sci-Fi, Musical, Crime, Adventure, and Thriller. The female group rated Romance and Musical movies more than the male group. On the other hand, both the female and male groups rated the Action, Sci-Fi, Crime, Adventure, and Thriller genres frequently, but the male group rated them considerably more often. Each group's preference is also similar to the average number of ratings.
Similar to existing meta-learning-based studies 9,10, we eliminate users who rated fewer than 13 movies or more than 100 movies. We construct the query set $Q_u$ by randomly selecting 10 items rated by each user and construct the support set $S_u$ with the remaining items. We generate fair meta-paths UM and UMUM through Algorithm 1 for each task $T_u = (S_u, Q_u)$, where $u \in U$. The fairness-aware transition probability shown in Table 3 is calculated through Eq. (1).
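The task construction described above can be sketched as follows. The function name `build_tasks`, its parameter names, and the input layout (a dictionary from user ID to a list of rated items) are assumptions made for illustration. The thresholds 13, 100, and 10 come from the text.

```python
import random


def build_tasks(ratings_by_user, min_ratings=13, max_ratings=100, query_size=10, seed=0):
    """Split each eligible user's rated items into (support, query) sets.

    Users with fewer than `min_ratings` or more than `max_ratings` rated
    items are dropped. `query_size` randomly chosen items form the query
    set, and the remaining items form the support set.
    """
    rng = random.Random(seed)
    tasks = {}
    for user, items in ratings_by_user.items():
        if not (min_ratings <= len(items) <= max_ratings):
            continue
        shuffled = list(items)
        rng.shuffle(shuffled)
        query = shuffled[:query_size]
        support = shuffled[query_size:]
        tasks[user] = (support, query)
    return tasks


# Hypothetical users with 12, 13, and 101 rated items.
toy_ratings = {"a": list(range(12)), "b": list(range(13)), "c": list(range(101))}
toy_tasks = build_tasks(toy_ratings)
```

With these thresholds, a kept user always has at least three items in the support set.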
We construct four experimental scenarios to evaluate performance in warm-start and cold-start environments: the Warm-Start (WS) state with existing users and items, the User Cold-start (UC) state with new users and existing items, the Item Cold-start (IC) state with existing users and new items, and the User-Item Cold-start (UIC) state with new users and new items. We evaluate the performance of the proposed model for the four experimental scenarios in section "Performance evaluation", and we assume the user-item cold-start (UIC) environment in sections "Model analysis" and "Parameter analysis".

Evaluation metrics
We adopt relevance and fairness metrics to evaluate FaRM. We use Mean Absolute Error (MAE) and Normalized Discounted Cumulative Gain at rank K (NDCG@K) as relevance metrics, and we set K = 5. We use Accuracy and Macro F-Score as fairness metrics 11. For these classification metrics, lower values denote better fairness performance, because they indicate that sensitive attributes have less influence on the learning process.
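The two relevance metrics can be sketched in plain Python. NDCG has several common variants; this sketch assumes a linear gain (the raw rating) and a log2 position discount, which may differ from the variant used in the experiments. The function names are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ndcg_at_k(y_true, y_pred, k=5):
    """NDCG@K with linear gain and log2 discount (an assumed variant)."""
    # Rank items by predicted score and keep the top k positions.
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)[:k]
    dcg = sum(y_true[i] / math.log2(rank + 2) for rank, i in enumerate(order))
    # The ideal ranking sorts items by their actual ratings.
    ideal = sorted(y_true, reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ratings: the first prediction list ranks the items perfectly,
# the second ranks them in reverse order.
toy_ratings = [5.0, 3.0, 1.0]
perfect = ndcg_at_k(toy_ratings, [0.9, 0.5, 0.1], k=5)
reversed_rank = ndcg_at_k(toy_ratings, [0.1, 0.5, 0.9], k=5)
```

A lower MAE is better, whereas a higher NDCG@K is better, so the two relevance metrics move in opposite directions.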

Compared methods
We compare FaRM with three existing methods: MetaHIN 10, Random, and NFCF 15. MetaHIN is a model that improves the accuracy of recommendations by introducing a heterogeneous information network to a meta-learning-based cold-start recommendation model. We use Random and NFCF as baseline models to evaluate the fairness of FaRM. Random is suitable as a baseline model for comparing fairness performance because it randomly estimates user preferences regardless of sensitive attributes. NFCF is a fairness-aware recommendation model that adds fairness to Neural Collaborative Filtering (NCF) 21.

Parameter settings
We adopt Adaptive Moment Estimation (Adam) for optimization, and we set the batch size to 32 and the maximum number of epochs to 100. The model $f$ consists of two fully-connected layers, and we set the hidden dimension of each layer to 64. We construct the embedding vectors for each attribute of the user and item and set the dimension of all embedding vectors to 32. We set both learning rates for local update and global update to 0.001, and set the fairness weight γ in Eq. (6) to 0.5. We experiment with the impact of the hyperparameter γ on the performance of FaRM in section "Parameter analysis".

Performance evaluation
We compare FaRM and different comparative models in four experimental scenarios (i.e., WS, UC, IC and UIC) in this section.Table 4 shows the results of the performance comparison experiments on the Movielens 1M dataset.

Fairness performance
Our method achieves the best performance for all fairness metrics in the three cold-start scenarios (i.e., UC, IC, and UIC). In detail, FaRM outperforms NFCF by 3.6% and 5%, and Random by 2% and 0.5%, on Macro-F and Accuracy, respectively, in the UC scenario with new users and existing items. FaRM significantly improves fairness performance compared to other methods in both the IC and UIC scenarios. These results show that FaRM contributes significantly to improving fairness in the cold-start states. On the other hand, in the warm-start scenario (i.e., WS), NFCF outperforms FaRM on Accuracy, but FaRM shows the best fairness performance on Macro-F. In particular, the experimental results show that FaRM performs much better than Random on Macro-F. Even though Random is a strong baseline, FaRM has a higher fairness performance than Random in most scenarios because the data distribution is unfair, as shown in Table 2. The fairness of Random's recommendation results depends on whether the training data distribution is fair or unfair. In contrast, our model achieves higher fairness performance than Random by corresponding to the distribution by genre regardless of the raw data distribution. These results imply that FaRM can generally improve fairness in all scenarios.

Relevance performance
MetaHIN showed the best performance in all states for the relevance metric NDCG@5, while the proposed method outperformed MetaHIN on MAE in all cold states. Furthermore, our method showed significantly higher performance on NDCG@5 than the two fairness-aware models in all four scenarios and the best performance on MAE in all cold states. NFCF achieves the best performance on MAE in the warm-start state (WS), while its relevance performance is similar to or lower than that of Random in the three cold-start environments (i.e., UC, IC, and UIC). These results show that NFCF performs poorly in cold-start scenarios because it is a warm-start model that does not consider new users or items. In contrast, the proposed method achieves the highest performance on MAE in the three cold-start environments (i.e., UC, IC, and UIC). Our fairness-aware model recommends the most widespread war movies, even to those who do not like war movies. Therefore, FaRM eliminates bias while improving the relevance performance and increases MAE performance by minimizing overfitting. This experiment shows that FaRM is suitable for reducing the loss of relevance performance while increasing fairness performance in cold-start states. Thus, FaRM significantly improves fairness performance by minimizing the trade-off between relevance and fairness.

Model analysis
We analyze the contribution of each component of FaRM to fairness performance in the user-item cold-start environment (UIC). Figure 6 shows the fairness performance on Macro-F and Accuracy when each component of FaRM is removed. The fairness performance of FaRM (i.e., including all components) is the highest, which means that all components of FaRM are essential. In other words, all components of the proposed model play an important role in improving fairness performance. We also find that the fairness regularizer is crucial for improving fairness. This shows that the recommendation model learns fairness appropriately through the fairness regularizer. We also find that the impact of the fairness-aware random walker is quite significant. This is because the fairly generated meta-path can reduce bias for the sensitive attribute of the user.

Parameter analysis
Figures 7 and 8 show the relevance and fairness performance according to the fairness weight γ in Eq. (6), respectively. The x-axis of each graph represents the hyperparameter γ and ranges from 0 to 1. Figure 7 shows that the relevance performance of FaRM decreases as the fairness weight increases. On the other hand, the fairness performance on Macro-F and Accuracy increases as the fairness weight increases, as shown in Fig. 8. These results show the influence of the fairness weight γ on fairness performance. We find that NDCG@5 declines significantly when the fairness weight is more than 0.6. We also find that the fairness performance does not improve significantly when the fairness weight is 0.6 or higher. Therefore, we set γ to 0.5 to minimize the trade-off between relevance and fairness performance.

Figure 2. User-item interaction matrix with warm users and cold users. $u_1$ and $u_2$ are warm users with two or more interacted items. $u_3$, $u_4$, and $u_5$ are cold users with one or fewer interacted items.

Figure 5. The types of meta-paths of FaRM. (a) An example of meta-path UM. (b) An example of meta-path UMUM.
Conclusion
In this paper, we propose a novel meta-learning-based recommendation framework to improve the fairness of recommendation models in cold-start environments. We propose a novel fair meta-path generation algorithm and fairness regularizer and introduce joint training on relevance and fairness objectives. In addition, each component of the proposed framework can be used by any model that needs to improve group fairness. Extensive experiments demonstrate that the proposed model outperforms state-of-the-art cold-start and fairness-aware recommendation models for relevance and fairness in various cold-start scenarios.

Figure 6. The effectiveness of each component of FaRM. Lower scores indicate better fairness.

Table 1. Statistics of the Movielens dataset.

Table 2. Gender-based statistics of movie genres in Movielens 1M dataset.

Table 3. Fairness-aware transition probability matrix for Movielens 1M dataset.

Table 4. Experimental results of relevance and fairness performance for different models in 4 scenarios. The best model is in bold, and the second-best model is in italics.